
cd "C:\Users\lenovo\Desktop\WongLiangPOBE\WongLiangStudy2\WongLiangStudy2multistate"

use "C:\Users\lenovo\Desktop\WongLiangPOBE\WongLiangStudy2\WongLiangStudy2multistate\WongLiangStudy2multistate.dta" , clear
//wide dataset 
drop if treatment == 0

*================================
* Divide the data by starting state (recommended comments, netizens comments, skip comments).
* We are interested in those who start with recommended comments, so only the first subset, df1, is created and used.
*================================

* starting with recommended comments
preserve 
keep if  time1choice==1 
save df1,replace 
restore 

*-------------------------------------------------------------------------------------
* Start with recommended comments, pass through skip comments, end with netizens comments
* Note that the variable names used in the dataset will be:
* positive = recommended comments; negative = netizens comments; and ignore = skip comments
*-------------------------------------------------------------------------------------

*================================
* Generate netizens comments state
*===============================
cd "C:\Users\lenovo\Desktop\WongLiangPOBE\WongLiangStudy2\WongLiangStudy2multistate"
use df1,clear 

* Keep only id, time*choice(1-6) and time(1-6) to calculate time duration
keep id time*choice time1 time2 time3 time4 time5 time6

* Rename time*choice to choice
rename time*choice choice*

* Reshape the wide data to long data (one row per id and policy order)
reshape long choice time,i(id) j(order)

* Generate backup variables
gen choice1=choice
gen time1=time 

* Get rid of choices that are not netizens comments (set 1 and 3 to missing)
recode choice1 (1 3=.)

* Generate the netizens comments state from the running (cumulative) sum of choice1. sum() treats missing values as 0,
* so the running sum first reaches 2 (the choice value) at the first netizens comments choice; that row is kept.
* We are only interested in the first occurrence of the netizens comments state: a running sum larger than 2 means a second or later occurrence.
* A running sum of 0 means the state has not occurred yet, so negative keeps its default value of 0.

bysort id (order): gen addchoice1=sum(choice1) //running sum within id, in policy order

gen negative=0 //no negative (yet) by default
replace negative=1 if addchoice1==2 & choice==2
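
* Illustration of the cumulative-sum rule above (an added sketch; the toy
* values below are hypothetical and are not study data). preserve/restore
* leaves the data currently in memory untouched.
preserve
clear
input id order choice
1 1 1
1 2 3
1 3 2
1 4 2
1 5 1
1 6 2
end
gen choice1 = choice
recode choice1 (1 3=.)
bysort id (order): gen addchoice1 = sum(choice1)
gen negative = addchoice1==2 & choice==2
list order choice addchoice1 negative, noobs
* With these toy values addchoice1 runs 0 0 2 4 4 6, so only order 3
* (the first choice of 2) is flagged.
restore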

* Keep only the row where the state first occurs (negative==1)
keep if negative==1
keep id order negative 

* Generate the variable "time of first occurrence of the netizens comments state" (t_negative).
* Where negative = 1, the corresponding value of "order" is the time at which the netizens comments state occurs.
rename order t_negative 
save o1df1,replace 

*==============================
* Repeat the process 
* Generate skip comments state
*==============================

use df1,clear 

* Keep only id, time*choice(1-6) and time(1-6) to calculate time duration
keep id time*choice time1 time2 time3 time4 time5 time6

* Rename time*choice to choice*
rename time*choice choice*

* Reshape the wide data to long data (one row per id and policy order)
reshape long choice time,i(id) j(order)

* Generate backup variables
gen choice1=choice 
gen time1=time 

* Get rid of choices that are not skip comments (set 1 and 2 to missing)
recode choice1 (1 2=.)

* Generate the skip comments state from the running (cumulative) sum of choice1. sum() treats missing values as 0,
* so the running sum first reaches 3 (the choice value) at the first skip comments choice; that row is kept.
* We are only interested in the first occurrence of the skip comments state: a running sum larger than 3 means a second or later occurrence.
* A running sum of 0 means the state has not occurred yet, so ignore keeps its default value of 0.

bysort id (order): gen addchoice1=sum(choice1) //running sum within id, in policy order

gen ignore=0 //no ignore (yet) by default
replace ignore=1 if addchoice1==3 & choice==3

keep if ignore==1
keep id order ignore 

* Generate the time of first occurrence of the ignore state (t_ignore)
rename order t_ignore 
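
* Optional check (an addition, not part of the original workflow): the 1:1
* merge on id later in this file assumes one row per id in this dataset.
* isid stops with an error if id does not uniquely identify the rows.
isid id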

save o1df2,replace 

*=========================
* Combining two datasets
*=========================
use df1,clear 
merge 1:1 id using o1df1
drop _merge
merge 1:1 id using o1df2
drop _merge

* Recode missing values of negative and ignore to 0 (the state never occurred)
recode negative ignore (.=0)

* Recode missing values of "time of ignore (t_ignore)" and "time of negative (t_negative)" to 6.
* For these subjects "ignore" or "negative" is 0, meaning that through policy number 6 there is no occurrence of that state.

* t_ignore and t_negative record at WHICH time point "ignore" or "negative" happens. 
* Subtracting 1 from each variable changes the meaning to "after HOW MANY transitions of comment exposure does ignore or negative happen".
* For instance, if t_negative = 5 and the corresponding negative = 0, then after five transitions the negative state had still not occurred. 
* If t_negative = 3 and negative = 1, then after three transitions the subject moved to the negative state. 

recode t_ignore t_negative (.=6)
replace t_ignore=t_ignore-1
replace t_negative=t_negative-1
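
* Optional consistency check (an addition, not part of the original workflow).
* It assumes that negative, ignore, t_negative and t_ignore are created only by
* the steps above and that there are six policies, so times run from 1 to 5.
assert t_negative==5 if negative==0
assert t_ignore==5 if ignore==0
assert inrange(t_negative,1,5) if negative==1
assert inrange(t_ignore,1,5) if ignore==1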

save ndf1,replace 

* Now continue with the do-file "WongLiangStudy2multistate_analysis.do"

